ABSTRACT

The theory of stochastic processes deals with systems
that develop over time in accordance with probabilistic laws. The
basic concepts involved in two types of continuous-time processes are
the idea of a constant probability of occurrence in the point event
process and the extensions necessary for the discrete state process.
The required mathematical skills and the technical literature in this
area are discussed. It is recommended that researchers responsible for
collecting longitudinal data change their method from a reference
point to an event history approach to item construction. The
difference between the two approaches is illustrated with sample
questions from two longitudinal studies sponsored by the National
Center for Education Statistics. (Author/PN)

As people with a special interest in longitudinal studies, we are very
concerned about how to study development and change over time. Today I will
introduce some developments in the statistics of stochastic processes that
make them particularly suitable for use with longitudinal data in education
and other social sciences.

Some of you may remember reading papers in graduate school about Markov
chain models for occupational mobility, industrial mobility, or educational
progression. In such a paper you would have seen, for example, a table with
the number of students enrolled in various kinds of schools in one year
cross-classified with those enrolled in the next year. You then would have
seen some equations with an unfamiliar mathematical form. The equations would
somehow be applied to the table and projections of the future distributions of
students among schools would be displayed. Such papers were infrequent
because the results were not impressive. The projections were clearly
inaccurate after a fairly short interval; the model applied to the system as
a whole, with no way to compare subgroups; and the ideas about why the world
would behave like a Markov chain were unconvincing. A Markov chain is one
kind of stochastic process, but we were right not to take that kind of work
seriously in the social sciences.

Over the past ten years, however, a number of new developments in
statistics and in sociology have drastically changed this picture. One change
has been from a stochastic process in which time progresses in discrete jumps
to a model with a continuous flow of time. This change improved the realism of
stochastic process models somewhat, in that changes could now take place at
times that do not coincide with the observation times. But the major change
toward realism occurred with the development of multivariate methods and
estimation techniques that model a world in which different kinds of people
can change and develop in different ways at different speeds.

Tuma, Hannan, and Groeneveld (1979) reported from their research on
effects of public welfare programs on family stability an example of just how
successful a continuous-time, multivariate stochastic process model can be, by
comparing its results with results from the better-known multiple regression
approach. Using data from the Seattle/Denver Income Maintenance Experiment,
they estimated a continuous-time, stochastic process model in which
female rates of marriage, marital dissolution, and attrition from the study
were multivariate functions of welfare support levels, normal income, prior
AFDC enrollment, children, age, education, and wage rates. In order to
compare the prediction errors with a multiple regression analysis, they looked
at seven outcome measures (being married, single, or lost to the study; being
continuously married or single; and the number of marriages and dissolutions)
at one year and at two years after the start of the experiment, regressed on
initial marital status and the same other causal variables used in their
stochastic process model. Even though their dynamic model was constrained to
fit the entire time period, while the regression equations were free to fit
each time point and outcome variable separately, they were surprised to find
that for 10 of the 14 outcomes their model explained more of the sample
variation than did linear regression analysis (1979, p. 841). In addition,
unlike multiple regression, stochastic process models predict the time path of
change and development.

This example provides evidence for the claim I am making here today: that
the statistics of stochastic processes for the social sciences have
developed to the point that those of us who work with longitudinal data need
to make a serious effort to become acquainted with them. The rest of my paper
covers three topics that are intended to aid this acquaintance. First, I
introduce a few of the basic ideas of stochastic processes; second, I point to
the essential reference papers describing the new methods and give some advice
on how to read them; and third, I make a plea for some changes in the way
longitudinal data are often collected, in order to make them more suitable to
the new statistics.

A FEW BASIC IDEAS OF STOCHASTIC PROCESSES 

The theory of stochastic processes deals with systems that develop over
time in accordance with probabilistic laws. There are two kinds of
continuous-time processes of interest: the point event process, in which
events happen but no changes in state occur (the name comes from its
representation of events as points on a time line), and the discrete state
process, in which a unit can change from one to another among a set of
categories (because the possible values are categorical, the set of states is
discrete; were the units free to take on any value, the state space would be
continuous).

The point event process. I begin with the point event process because it
is simpler. This kind of process could describe, for example, telephone calls
arriving at a switchboard, persons arriving at a line in a bank, or traffic
accidents on a local highway. For this model, I consider the calls, the
people, and the accidents to be independently generated and interchangeable.

The chance that more than one event happens in a small time interval can
be represented by a general function, $o(\Delta t)$, representing the rate at
which multiple events occur, where $\Delta t$ is the time interval. By
assumption, this function gets small much more quickly than the time interval,
$\Delta t$. This assumption means that if we could measure time accurately
enough, events would occur one at a time, because this term vanishes relative
to $\Delta t$ as $\Delta t$ goes to zero.

If the calls or people or accidents have a constant probability of arriving,
with no trend or periodicity, then the chance that the event happens exactly
once in a time interval, $\Delta t$, has two parts: $r\,\Delta t + o(\Delta t)$,
where r is the constant rate of occurrence. The chance that no event occurs is
one minus the other two possibilities: $1 - r\,\Delta t + o(\Delta t)$. (The
coefficient and sign of the $o(\Delta t)$ term do not matter, because whatever
they are, the term vanishes later.) If we assume that events happen randomly
with respect to time, the chance that an event happens in the next time
interval is independent of any occurrences in previous intervals. These
assumptions define what is called a Poisson process.

While the probability of an event is assumed to be constant with respect
to time, time intervals between events are not expected to be constant; long
intervals are much less likely to occur than short intervals. The
distribution of durations between events can be derived from the above
assumptions in a few steps. Let T be a random variable representing the
duration until the first event after the starting time, and let P(t) be the
probability that T is greater than an arbitrary time point, t:

\[ P(t) = \Pr(T > t). \]

Then for $\Delta t > 0$,

\[
\begin{aligned}
P(t + \Delta t) &= \Pr(T > t + \Delta t) \\
 &= \Pr(T > t \text{ and no event in the interval } (t, t + \Delta t)) \\
 &= \Pr(T > t)\,\Pr(\text{no event in } (t, t + \Delta t) \mid T > t).
\end{aligned}
\]

The independence assumption means that this conditional probability is
unaffected by the condition, T > t, which refers to what happened before t,
and the probability of no event in the interval $(t, t + \Delta t)$ is
$1 - r\,\Delta t + o(\Delta t)$. Therefore,

\[ P(t + \Delta t) = P(t)\,(1 - r\,\Delta t + o(\Delta t)). \]

The derivative of the duration probability with respect to time can now be
written as

\[
\begin{aligned}
\frac{dP(t)}{dt} &= \lim_{\Delta t \to 0} \frac{P(t + \Delta t) - P(t)}{\Delta t}
 = \lim_{\Delta t \to 0} \frac{P(t) - r\,\Delta t\,P(t) + o(\Delta t)\,P(t) - P(t)}{\Delta t} \\
 &= \lim_{\Delta t \to 0} \frac{-r\,\Delta t\,P(t) + o(\Delta t)\,P(t)}{\Delta t}.
\end{aligned}
\]

Since $o(\Delta t)$ goes to zero more quickly than $\Delta t$, the second term
disappears in the limit, and $\Delta t$ divides out of the first term. What is
left is a differential equation,

\[ \frac{dP(t)}{dt} = -r\,P(t), \]

that can be solved by first separating terms,

\[ \frac{dP(t)}{P(t)} = -r\,dt, \]

and then integrating both sides,

\[ \int \frac{dP(t)}{P(t)} = -r \int dt. \]

Using the facts that $\int dx = x$ and $\int du/u = \ln u$, this becomes

\[ \ln P(t) = -rt, \]

where the constant of integration is zero because P(0) = 1. Exponentiating
both sides gives the exponential form (so typical of stochastic processes) for
the likelihood of a given duration between events:

\[ P(t) = e^{-rt}. \]

I presented this derivation to illustrate several points. First, a very
simple model, in which the chance of an event occurring is constant over time,
generated this exponential form. With more complex models, of course, the
equations get more complicated, but the exponential form is very common.
Second, the kind of mathematical tools that you need are somewhat different
from the ones you use with linear models. You need to be able to work with
exponentials, logarithms, and some calculus. Third, the rate parameter, r,
governs how rapidly events occur. As the rate increases, the time between
events decreases. In fact, the mean duration between events is 1/r. Rates of
occurrence are bounded by zero at the lower end (even extremely long durations
between events have a positive rate) and by a value of ten or fifteen at the
upper end (beyond these values the durations between events get too small to
measure).
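
The statement that the mean duration is 1/r can be checked in one line from the
duration probability P(t). The step below is a sketch that relies on a standard
identity not spelled out in the text, namely that the mean of a nonnegative
random variable equals the integral of Pr(T > t):

```latex
\[
E[T] = \int_0^\infty \Pr(T > t)\,dt
     = \int_0^\infty e^{-rt}\,dt
     = \left[ -\frac{1}{r}\,e^{-rt} \right]_0^\infty
     = \frac{1}{r}.
\]
```

With r = 2, for example, the mean time between events is one half of a time
unit.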

In trying to understand how these processes work, it is very useful to
simulate some cases, or "realizations". With discrete-time models, each step
proceeds in constant jumps, but in continuous-time models, the time between
events is variable. From the relationship, $P(t) = e^{-rt}$, the time
intervals can be derived. A random number generator, U, that is uniform in the
(0,1) interval supplies equally likely values for the probability P(t), so

\[
\begin{aligned}
P(t) = e^{-rt} &= U \\
-rt &= \ln U \\
t &= -(\ln U)/r.
\end{aligned}
\]

The minus sign makes t positive, since ln U is negative for U between 0 and 1.
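
As a minimal sketch of such a simulation, the following Python fragment draws
waiting times from uniform random numbers and accumulates them into event
times. The function name, the seed, and the observation horizon are
illustrative assumptions, not part of the original paper; the waiting time is
computed as -(ln U)/r so that it is positive.

```python
import math
import random

def simulate_event_times(rate, horizon, rng):
    """Return the event times of one realization on the interval [0, horizon]."""
    times = []
    t = 0.0
    while True:
        u = 1.0 - rng.random()        # uniform on (0, 1], so log(u) is defined
        t += -math.log(u) / rate      # waiting time until the next event
        if t > horizon:
            return times
        times.append(t)

rng = random.Random(1965)             # arbitrary seed, only for repeatability
realization = simulate_event_times(rate=2.0, horizon=5000.0, rng=rng)

# Durations between successive events; their mean should be close to 1/r.
gaps = [b - a for a, b in zip([0.0] + realization, realization)]
mean_gap = sum(gaps) / len(gaps)
```

With a long horizon the average gap settles near 1/r, while the individual
gaps vary widely, with short gaps much more common than long ones.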

A simulation is useful for generating "data" from a model to help understand
how the model works. Figure 1 (from Cox and Miller, 1965, p. 7) shows two
realizations of this simple point-event process with the constant rate, r,
equal to 2.

[FIGURE 1: two realizations of the point event process, with events marked by
an x along the time axis]

An example of programming a simulation of a related continuous process can be
found in Knott (1981).

It is also important to be able to go in the other direction, from having
data to obtaining an estimate of the rate parameter. A straightforward
estimate in this case is the total number of events divided by the total time
the process was observed. A general problem arises with real data, however,
in that the last observable time period is necessarily truncated. In Figure
2, the period of observation lies between the two vertical lines extending
below the time axis at 0 and T. The times at which events occur are marked,
as before, with an x. But the existence of the last event on the right is only
assumed, since the event had not yet occurred at the observation time T.
Whenever the last observed period does not end with an event, the period is
said to be "right-censored." If the first period does not begin at the
beginning of the process or with an event, the record is left-censored.
Censoring is a characteristic of an observation plan, not the process itself.
Sørensen (1977) and Tuma and Hannan (1978) discuss how to use data from
censored time periods in such a way as to avoid bias in the estimators.
Censoring is particularly important in social science research because
longitudinal studies generally have short observation periods in relation to
the rates of change in the outcomes of interest.

[FIGURE 2: events marked by an x along a time axis, with the period of
observation running from 0 to T]
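
The straightforward estimate described above can be sketched in a few lines of
Python. The event times and the observation window are hypothetical, and the
comparison with a "naive" estimate is my own illustration of why the
right-censored stretch should be kept as exposure time; it is not a
reproduction of the estimators in Sørensen (1977) or Tuma and Hannan (1978).

```python
def estimate_rate(event_times, start, end):
    """Number of events observed in [start, end] divided by the time observed."""
    observed = [t for t in event_times if start <= t <= end]
    return len(observed) / (end - start)

# Hypothetical record: four events in a window of ten time units. The stretch
# from 8.5 to 10.0 is right-censored but still counts as time at risk.
events = [1.2, 3.1, 3.9, 8.5]
rate_hat = estimate_rate(events, start=0.0, end=10.0)

# Discarding the censored stretch shortens the exposure time, so the same
# four events yield a larger estimated rate.
naive_rate = len(events) / events[-1]
```

Here the estimate that keeps the censored stretch is 0.4 events per time unit,
while the estimate that drops it is about 0.47.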

The point event model discussed so far needs to be extended for social
science use in a way that permits different people to have different rates of
change. This is done by letting the rate parameter, r, be a multivariate
function of background variables. In one of the earliest pieces of social
science research of this kind,* Nancy Tuma (1976) modelled the rate of leaving
a job as a multivariate linear function of job rewards and personal
resources.**

Tuma (1976, pp. 342-3) discusses the conceptual advantage of modelling
directly the rate of change parameter, rather than its observable
consequences. While the transition rate is not itself observable, several
observables can be derived from it: whether or not an event occurs by a given
point in time, the duration of the interval between events, and the number of
events in a given period of time. These three observables have each been
modelled as a linear function of background variables. Tuma showed
mathematically that the three types of variables cannot simultaneously be
linear functions of background variables; if any one is a linear function, the
other two cannot be linear functions. Yet if the rate parameter is made a
linear function of the background variables, values of the three observables
can be derived simultaneously from a single model. The first goal of our study
of stochastic processes should be to understand the nature and mathematical
properties of the rate parameters.



* An earlier paper by Sørensen (1975) modelled job shifts as a function of a
single variable, race.

** A linear function has the undesirable property, however, of occasionally
predicting negative rates of job mobility, which are mathematically undefined.
To deal with this problem, Tuma later developed a method for constraining the
rate parameter to positive values with a log-linear function, in which the
logarithm of the rate is a linear function of the background variables.
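
The difference between the two specifications can be seen in a small sketch.
The coefficients and covariate values below are hypothetical numbers chosen
only to show a case where the linear form goes negative; they are not
estimates from Tuma's work.

```python
import math

def linear_rate(coefs, x):
    """Rate as a linear function of background variables (can be negative)."""
    return sum(b * v for b, v in zip(coefs, x))

def log_linear_rate(coefs, x):
    """Rate whose logarithm is linear in the variables (always positive)."""
    return math.exp(sum(b * v for b, v in zip(coefs, x)))

coefs = [0.5, -0.3]     # hypothetical intercept and job-reward coefficient
person = [1.0, 2.5]     # constant term and a hypothetical job-reward score

r_linear = linear_rate(coefs, person)          # 0.5 - 0.75, a negative "rate"
r_log_linear = log_linear_rate(coefs, person)  # exp(-0.25), a positive rate
```

Whatever the coefficients, the exponential keeps the predicted rate above
zero, which is the property the footnote describes.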

The discrete state process. While the discrete state process is more
complex than the point event process, there are many similarities. This kind
of process could describe, for example, a person changing from employed to
unemployed to out of the labor force; a person changing from single to married
to dropped out of the study; or a person changing from junior college to
college to dropped out of school.

A discrete state process can model such changes in status. If Y(t) is a
random variable whose value is the state occupied by a unit at time t, then
for any two time points, t and u (t < u), and for any two states at these time
points, one can define a transition probability, the likelihood that a unit
will occupy state k at time u, given that it occupied state j at time t:

\[ p_{jk}(t,u) = \Pr[\,Y(u) = k \mid Y(t) = j\,]. \]

Since in this model time is continuous, there is no advantage in arbitrarily
choosing any two particular time points at which to examine this set of
transition probabilities. The only way to handle such transitions in a general
way is to rely on the assumption of constant rates of change, as in the point
event processes, and show how the transition probabilities between any two
points in time depend on these underlying rate parameters.

If one assumes that the probability of a transition occurring in a small
time interval $(t, t + \Delta t)$ depends only on the state occupied at time t
and not on any of the states occupied previously, then transition
probabilities over long time intervals can be built up recursively from
transition probabilities over shorter time intervals. Letting P(t,u) be a
matrix whose elements are the $p_{jk}(t,u)$ above, this assumption means that
for three time points,

\[ P(s,u) = P(s,t)\,P(t,u), \qquad s < t < u. \]

By letting the difference between t and u approach zero, this relationship can
be used to define the derivative of the transition probabilities with respect
to time:

\[ \frac{dP(s,t)}{dt} = P(s,t)\,R, \]

where R is a matrix of constant rate parameters similar to that of the point
event process. As before, the solution of this differential equation results
in an equation with an exponential form,

\[ P(s,t) = e^{R(t-s)}, \]

that expresses the transition probabilities between any two time points as a
function of the rate parameters and the elapsed time.*
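
To make the exponential form concrete, here is a minimal sketch that computes
transition probabilities from a rate matrix and checks the multiplication rule
for three time points. The rate matrix is hypothetical (three states such as
employed, unemployed, and out of the labor force), and the matrix exponential
is approximated by a truncated power series with scaling and squaring, which
is one of several possible numerical methods and not necessarily the one used
in the programs cited in this paper.

```python
def mat_mul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_exp(m, terms=20, squarings=10):
    """Approximate e^m: sum the power series of m / 2^squarings, then square."""
    n = len(m)
    small = [[v / 2.0 ** squarings for v in row] for row in m]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, small)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def transition_probabilities(rates, elapsed):
    """P(s, t) = e^{R (t - s)} for elapsed = t - s."""
    return mat_exp([[r * elapsed for r in row] for row in rates])

# Hypothetical rate matrix; each row sums to zero.
R = [[-0.5, 0.4, 0.1],
     [0.9, -1.2, 0.3],
     [0.2, 0.1, -0.3]]

p_short = transition_probabilities(R, 0.5)   # P(0, 0.5)
p_long = transition_probabilities(R, 1.5)    # P(0.5, 2)
p_total = transition_probabilities(R, 2.0)   # P(0, 2)
product = mat_mul(p_short, p_long)           # should reproduce p_total
```

Each row of a transition matrix computed this way sums to one, and the product
of the two shorter-interval matrices agrees with the matrix for the whole
interval up to rounding error.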

Sometimes the analyst is also interested in the proportions of people (or
units) occupying each state over time. Letting the vector, P(t), represent
the proportion of people in each state at time t, this relationship can be
used to express the state probabilities as a function of the initial
distribution, the rate parameters, and the elapsed time:

\[ P(t) = P(0)\,P(0,t) = P(0)\,e^{Rt}. \]

A graph displaying an example of this function is contained in Tuma, Hannan,
and Groeneveld (1979, p. 843), in which the authors plot the observed and the
predicted proportions of married welfare mothers over a two year period.



* Raising the constant e to a matrix power is probably an unfamiliar operation
for most people. In fact, a function of a square matrix can be expressed as
the eigenvectors of the matrix times that function of its eigenvalues. More
specifically, for the matrix A, if $A = B\,C\,B^{-1}$, where C is a diagonal
matrix of eigenvalues and the columns of B are their corresponding
eigenvectors, then $f(A) = B\,f(C)\,B^{-1}$. In the case of this matrix
exponential, this means

\[ e^{Rt} = B\,\mathrm{diag}(e^{c_1 t}, \ldots, e^{c_K t})\,B^{-1}. \]

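
For a process with only two states the eigenvalues and eigenvectors can be
written down by hand, which makes the footnote easy to try out. The sketch
below assumes a hypothetical two-state rate matrix R = [[-a, a], [b, -b]]
(for example, married and single), whose eigenvalues are 0 and -(a + b); the
rates and starting proportions are illustrative numbers only.

```python
import math

def two_state_transition_matrix(a, b, t):
    """e^{Rt} for R = [[-a, a], [b, -b]], built as B diag(e^{c t}) B^{-1}."""
    eigenvalues = [0.0, -(a + b)]
    B = [[1.0, a],
         [1.0, -b]]                              # columns are eigenvectors
    B_inv = [[b / (a + b), a / (a + b)],
             [1.0 / (a + b), -1.0 / (a + b)]]
    D = [math.exp(c * t) for c in eigenvalues]   # function of the eigenvalues
    return [[sum(B[i][k] * D[k] * B_inv[k][j] for k in range(2))
             for j in range(2)] for i in range(2)]

def state_proportions(p0, a, b, t):
    """Row vector of state proportions: P(t) = P(0) e^{Rt}."""
    P = two_state_transition_matrix(a, b, t)
    return [sum(p0[j] * P[j][k] for j in range(2)) for k in range(2)]

a, b = 0.2, 0.5          # hypothetical rates out of state 1 and out of state 2
p0 = [0.6, 0.4]          # hypothetical initial proportions

after_one_year = state_proportions(p0, a, b, 1.0)
long_run = state_proportions(p0, a, b, 100.0)
```

As the elapsed time grows, the term with the negative eigenvalue dies away and
the proportions approach b/(a + b) and a/(a + b), whatever the starting
distribution.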
The rate parameter of the point event process is in some respects similar
to and in some ways different from the transition rates of the discrete state
process. As before, the elements of the matrix of transition rates, $r_{jk}$,
must be positive numbers greater than zero (and, in practical terms, less than
about 10 or 15), with this exception: the diagonal elements are negative.
The exponent in the equation for the point event process had a negative sign;
in the discrete state process, this negative sign applies only to the diagonal
elements, so it does not appear in the equation. The elements of the rate
matrix are not all independent parameters. The diagonal elements are equal
to, but opposite in sign from, the sum of the other row elements:

\[ r_{jj} = -\sum_{k \neq j} r_{jk}. \]

The duration of time spent in each state can differ from one state to another,
but in each case the average duration is $-1/r_{jj}$, which is positive
because $r_{jj}$ is negative. In addition, the ratio $-r_{jk}/r_{jj}$ is the
conditional probability that a change is to state k, given that a change out
of state j occurs.
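
These two quantities are simple to read off a rate matrix. The sketch below
uses a hypothetical three-state matrix whose diagonal elements are the
negative sums of the other elements in each row; the function names are
illustrative.

```python
def mean_durations(rates):
    """Average time spent in each state: -1 / r_jj."""
    return [-1.0 / rates[j][j] for j in range(len(rates))]

def destination_probabilities(rates):
    """Probability that a move out of state j goes to state k: -r_jk / r_jj."""
    n = len(rates)
    return [[0.0 if k == j else rates[j][k] / -rates[j][j] for k in range(n)]
            for j in range(n)]

# Hypothetical rate matrix with rows summing to zero.
R = [[-0.5, 0.4, 0.1],
     [0.9, -1.2, 0.3],
     [0.2, 0.1, -0.3]]

durations = mean_durations(R)
destinations = destination_probabilities(R)
```

For the first state, a rate of leaving of 0.5 per time unit implies an average
stay of 2 time units, and four out of five moves go to the second state.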

While the details of the above equations may not be well understood by
those new to stochastic processes, it is important to understand that these
transition rates are the essential quantities of interest in dealing with
development and change that happens continuously over time, because these
quantities describe and govern the time path of changes. From the transition
rates a number of observable characteristics can be derived exactly: the
probability of occupying a given state at any given point in time, the
expected duration in a state, and the number of changes in a given interval.

As in the case of the point event process, the discrete state process
needs to be extended for use in the social sciences in a way that permits
different kinds of people to develop in different ways. To introduce causal
relationships, the dependence of the transition rates on the observable
variables can be specified in either of two ways. First, when the decision to
leave the current state can be separated from the choice of destination, one 
can model the rate of leaving (the inverse of the duration) and the condition- 
al destination rates: 

and 

Alternatively, if the rates of moving (the off-diagonal matrix elements) are 
not conceptually separate from the rates of staying (the diagonal matrix 
elements), a log-linear decomposition (to constrain the predicted rates to be 
positive) of all but the diagonal elements would be appropriate: 

It is because recent developments made this extension possible that we now can 
use this powerful class of statistical methods for longitudinal research in 
the social sciences. 

I have a few general comments about estimation methods. First, unlike
the old Markov chain models, the rate parameters are estimated not from
grouped data in cross-tabulations, but from individual data from unit records.
Second, maximum-likelihood or partial-likelihood techniques are used that have
good properties even in small samples with a fair amount of censoring (Tuma
and Hannan, 1978). These techniques make possible estimates of the standard
errors for statistical tests of the significance of the coefficients
expressing the dependence of the transition rates on the causal variables.
Third, while the transition rates are unobservable, there need be no mystery
to the estimation methods. As in the case of point events, the computations
derive from the total number of transitions of each kind divided by the total
amount of time a person was exposed to the possibility of such a transition.
Finally, computer programs are currently available, though not as a part of
SPSS or SAS. Nancy Tuma's program (1980) is available for a small fee from
Nancy Tuma directly at Stanford University. James Coleman (1981) has also
published programs for this kind of work. While I have not examined Coleman's
programs in detail, my first impression is that Tuma's program is more
friendly to the user.
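
The counting idea behind these computations can be sketched for the simplest
case, constant rates with no causal variables. The episode records below are
hypothetical, and the function is an illustration of "transitions divided by
exposure time" only; it is not the algorithm of Tuma's or Coleman's programs,
which also estimate the effects of covariates and their standard errors.

```python
from collections import defaultdict

def estimate_transition_rates(episodes):
    """Each episode is (origin, destination, duration); destination is None
    when the episode was still in progress at the end of observation."""
    counts = defaultdict(int)
    exposure = defaultdict(float)
    for origin, destination, duration in episodes:
        exposure[origin] += duration          # censored time still counts
        if destination is not None:
            counts[(origin, destination)] += 1
    return {pair: n / exposure[pair[0]] for pair, n in counts.items()}

# Hypothetical episodes pooled over several people, durations in years.
episodes = [
    ("married", "single", 2.0),
    ("single", "married", 1.0),
    ("married", None, 3.0),       # right-censored
    ("single", "married", 0.5),
    ("married", "single", 1.0),
    ("single", None, 2.5),        # right-censored
]

rates = estimate_transition_rates(episodes)
```

In this made-up record, two dissolutions over six years of exposure give a
rate of one third per year, and two marriages over four years of exposure give
a rate of one half per year.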



LEARNING TO USE STOCHASTIC PROCESSES 

Those of us who are persuaded that stochastic process models may be
useful in our research and would like to learn to use them are faced with a
difficult choice. In a way we are like potential buyers of a home
microcomputer or an office word processor, asking ourselves whether we can
afford to pass up immediate benefits to wait for the lower costs and improved
performance that are likely a few years from now. Unless we need to use
stochastic processes right now in our research (and members of this group are
likely to have this need), there may be some advantage to waiting a few years
while keeping an eye on further developments in this area. On the other hand,
it took me longer to learn these methods than I expected, because the
necessary mathematical skills are different from what I had been using, so I
needed remedial work.

For those who want to start the process now, I offer this observation
from my own experience. The first section of this paper illustrates my point
that the mathematical skills required are more advanced than those required to
use path analysis and structural equation models. One must be sufficiently
exposed to elementary probability and statistics, matrix algebra, calculus,
and differential equations to understand the basic concepts of stochastic
processes. Unfortunately, there is as yet no introductory, graduate level
social science textbook for continuous-time, multivariate stochastic
processes. Such a textbook in econometrics, for example, often contains a
chapter on the basic results in matrix algebra that one needs to understand
econometrics. The absence of such a well focused textbook means learning some
unneeded skills. In addition, the absence of such a textbook means we have to
learn by reading a variety of technical journal articles that assume a
generally higher level of mathematical sophistication than would be necessary
for those who wish simply to apply these techniques and not necessarily to
contribute to their development.

I recommend that you begin your study with Hannan and Tuma's (1979)
nontechnical overview of methods for temporal analysis, and then proceed to
some of the longitudinal research applications that have already begun to
appear in the literature, in which the focus is on substance, with minimal
attention on statistical technique. With these papers, you can set a minimal
standard for what you need to learn. Here are five examples of longitudinal
research applications: Hannan, Tuma, and Groeneveld (1977) used this method to
explain the effects of experimental negative income tax welfare programs on
marital stability. Hannan and Carroll (1981) used this method to explain the
effects of population, ethnic diversity, and gross national product on changes
in the forms of government of the nations of the world. Sørensen and Tuma
(1981) used this method to model the effects of ability, educational
attainment, wages, and occupational standing on upward, downward, and lateral
job changes. Rosenfeld (1981) used this method to model the ways in which
career history, individual characteristics, and age affect transition rates
between different types of jobs for men and women with advanced training in
psychology. Felmlee (1982) used this method to model the effects of sex, job
rewards, individual resources, social constraints, and age on rates of job
changes within and between employers.

To acquire a knowledge of the formal assumptions, you need to study a
textbook on stochastic processes, such as Cox and Miller (1965, Introduction
and Ch. 4), Breiman (1973), Feller (1968, Ch. 17), Karlin and Taylor (1975),
or perhaps Bartholomew (1973). One of these textbooks must be consulted,
because none of the technical journal articles takes the space needed to
explain how to get from the Markov assumption to the solution of the forward
or backward Chapman-Kolmogorov differential equations.

To acquire an understanding of the properties of the matrix of transition
intensities, the work of Singer and Spilerman (1974, 1976a, 1976b, 1978) is
very useful. While these papers explain the advantages to changing from
discrete-time Markov chains to continuous-time Markov processes and
definitively treat the issues of embeddability and identification, this work
is not practical for multivariate analysis because it treats only grouped
data in cross-tabulations.

To learn how to analyze ungrouped longitudinal data with continuous-time
stochastic processes, it makes sense to start with point event processes
before moving to discrete state processes, as I did in the preceding section.
Sørensen (1975) and Tuma (1976) discuss such models, and Sørensen (1977) and
Tuma and Hannan (1978) explain what is to be done with censored data. The
most important reference for discrete state models is the paper by Tuma,
Hannan, and Groeneveld (1979), though it is probably too difficult for a
beginner to follow. Tuma and Hannan should soon have a book out on the
subject, and Coleman (1981) has recently published a book on continuous-time,
discrete-state stochastic processes. While I have not yet finished the new
Coleman book, so far I can report that it makes demands on mathematical skills
that most beginners cannot meet. Issues of estimation are treated in Coleman
(1981, Ch. 6), in Tuma, Hannan, and Groeneveld (1979), and in Tuma (1980).

For the past few summers, the National Opinion Research Center has been
sponsoring, with some support from my office at NCES and the Labor
Department, a summer short course in methods for analysis of longitudinal
data, including the class of models I have discussed here. There are no other
classes outside of the graduate schools of which I am aware.

COLLECTING LONGITUDINAL DATA 

The next section of my paper is addressed to all those who collect longi- 
tudinal data, including those who want to learn these methods later, or per- 
haps leave this kind of analysis to the next generation of graduate students. 
This new class of methods requires slightly different questions in
longitudinal surveys; it requires retrospective data on the timing of changes.
The
statistical estimates of the parameters of these models require data on the 
number of events of each type that occur to each person and the duration of 
time that each person was exposed to the possibility of each event. For 
example, these models need the dates people start and stop working at each 
job, the dates that people enter and leave each school they have attended, the 
dates of marriages and marital separations, and the dates of military service. 
If the intervals between survey waves are not long in comparison with the 
rates at which the events under study occur, questions of this type are not 
more space-consuming or burdensome than asking the respondents about their 
status at several different reference points. If the survey asks about each 
job, or each spell of schooling, then the analyst can assume there are no 
changes other than those listed by the respondents, and, consequently, one has 
full knowledge of the respondents' status at every point in time. 
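
The contrast between the two kinds of questions can be illustrated with a
small sketch. The spell records, the month numbering, and the choice of
reference months are all hypothetical; the point is only that dated spells
allow status at any month to be reconstructed, while a few reference months
can miss a short spell entirely.

```python
def status_at(spells, month, default="not enrolled"):
    """State occupied in a given month, from (state, start, end) spell records."""
    for state, start, end in spells:
        if start <= month < end:
            return state
    return default

def months_in_each_state(spells):
    """Total exposure time per state, as needed for estimating rates."""
    totals = {}
    for state, start, end in spells:
        totals[state] = totals.get(state, 0) + (end - start)
    return totals

# Hypothetical respondent; months are counted from high school graduation.
spells = [("vocational school", 5, 13), ("2-year college", 14, 26)]

# Reference point approach: status in three hypothetical October months.
snapshot = [status_at(spells, m) for m in (4, 16, 28)]

# Event history approach: every spell, with its duration, is known.
exposure = months_in_each_state(spells)
```

In this example the three reference months record "not enrolled", "2-year
college", and "not enrolled", so the eight months of vocational schooling never
appear, whereas the spell records preserve both episodes and their durations.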

The use of stochastic process models in analysis requires a change in
approach to collecting data, from asking about reference points to asking
for complete event histories. I can be more specific about this change by
using as examples selected questionnaire items from NCES's two longitudinal
studies, the National Longitudinal Study of the High School Class of 1972
(NLS-72) and the High School and Beyond (HS&B) study. The appendix to this
paper contains selected education items from four follow-up surveys, the last
three follow-ups from the NLS-72 and the first follow-up from HS&B.

The NLS-72 second follow-up needed to cover only a single year since the
first follow-up, so it asked about school attendance in October 1974, a
reference point, and other schools from October 1973 to October 1974, with
beginning and ending dates of each. This was quite a reasonable and effective
approach, provided that the respondent had not attended more than two schools
in that period, and that the respondent did not drop out and then re-enter
either school. In the latter case, the answer to "when did you first attend
this school?" would not properly describe the episodes of schooling.

The NLS-72 third follow-up needed to cover two years since the second
follow-up and, in order to ask about each reference point and to keep the
questionnaire under a one-hour maximum required for EDAC clearance, the
question about other schools at other times was deleted. So the respondent
could report attending up to only two schools in this period, both of which
had to be in October. This left the net too coarse to catch short spells of
schooling, especially for vocational students, for whom I estimated that the
average duration of schooling was about four-fifths of a year.*



* Using methods described in Singer and Spilerman (1976b), and using NLS-72
data to construct turnover tables between 4-year schools, 2-year schools,
vocational schools and non-enrollment, for three pairs of the years 1972,
1973, and 1974, I extracted three continuous-time transition intensity
matrices. The inverse of the diagonal element for vocational school was about
0.8 years in each case. The results of this exercise were not published.



Another piece of evidence that suggests under-reporting of the flow of
students through the postsecondary vocational sector is the unexpectedly high
cumulative proportion of the NLS-72 cohort earning certificates and licenses
(26.7 percent; see Kolstad, 1981). In addition, the date of leaving the
school attended in October 1975 was not asked.

The NLS-72 fourth follow-up needed to cover three years since the third 
follow-up, and the deficiencies of the reference point approach had become 
apparent. The item was changed to ask about school attendance for each month 
in each of the three years, but the school name was requested only for the 
last month attended, thus permitting good data on the October reference 
months. Unfortunately, the correspondence between this school and the prior 
months of attendance is tenuous, especially when distinct spells of attendance 
were reported. 
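
The limitation of the month-by-month format can be made concrete with a small sketch in Python. It is an illustration only: the function names and the sample respondent are hypothetical. Monthly attendance flags collapse cleanly into spells, but the single reported school name can be attached with any confidence only to the final spell, and even there a change of school without a break in attendance would go unrecorded.

```python
def spells_from_months(attended):
    """Collapse monthly attendance flags into (first, last) spells.

    `attended` holds one boolean per month in calendar order; the returned
    pairs are zero-based month indexes, inclusive at both ends.
    """
    spells = []
    start = None
    for month, flag in enumerate(attended):
        if flag and start is None:
            start = month                      # a new spell begins
        elif not flag and start is not None:
            spells.append((start, month - 1))  # the spell ended last month
            start = None
    if start is not None:                      # attending through the last month
        spells.append((start, len(attended) - 1))
    return spells


def spells_with_last_school(attended, last_school):
    """Attach the reported school name to the final spell only.

    Earlier spells get None, because the questionnaire asked for the school
    name only for the last month attended in each year.
    """
    spells = spells_from_months(attended)
    return [
        (first, last, last_school if idx == len(spells) - 1 else None)
        for idx, (first, last) in enumerate(spells)
    ]


if __name__ == "__main__":
    # Hypothetical respondent, November 1978 (index 0) to October 1979 (index 11).
    flags = [True, True, False, False, False, True, True, True,
             False, False, False, False]
    for first, last, school in spells_with_last_school(flags, "School X"):
        print(first, last, school)
```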

The HS&B first follow-up needed to cover two years since high school. 
This questionnaire abandoned all attempts to maintain October reference 
points, and asked about up to five different schools, even permitting 
simultaneous attendance in two schools.* When the HS&B first follow-up data 
become available in the next year, we may begin to see event history analyses 
of student flows through the school system, because this is how the questions 
need to be asked in order to use stochastic process models of the kind I have 
discussed. I should point out that the National Longitudinal Survey of Young 
Americans, sponsored by the Labor Department, has also changed from a 
reference point to a retrospective history approach to labor market 
experience items. 



* This may happen, for example, in training for the nursing profession; the 
student may be trained in a teaching hospital while taking academic courses in 
a junior college. 

Once the data are collected in the form of a variable number of episodes, 
there are some choices to be made about how to structure the resulting data 
files. Blum, Karweit, and Sørensen (1969), Karweit (1973), and Ramsoy and 
Clarkson (1977) discuss the advantages of variable-length records for data 
storage, though standard packages like SPSS may have problems with this 
method. 
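
A minimal sketch of what a variable-length record might look like, written in Python, is given here. It is an assumption for illustration, not the layout used by any of the authors cited: the class names, field names, and sample respondent are all hypothetical. Each respondent carries a list of episodes of any length, and the status at any reference point can be recovered from that list, whereas the reverse is not possible.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

YearMonth = Tuple[int, int]  # (year, month), e.g. (1981, 10) for October 1981


@dataclass
class Episode:
    """One spell of attendance at one school."""
    school: str
    start: YearMonth
    end: Optional[YearMonth] = None  # None means still attending


@dataclass
class Respondent:
    """A variable-length record: any number of episodes per person."""
    respondent_id: int
    episodes: List[Episode] = field(default_factory=list)


def schools_at(resp: Respondent, when: YearMonth) -> List[str]:
    """Reference-point status recovered from the event history.

    Returns every school attended in the given month, so simultaneous
    attendance at two schools is preserved.
    """
    return [
        ep.school
        for ep in resp.episodes
        if ep.start <= when and (ep.end is None or when <= ep.end)
    ]


if __name__ == "__main__":
    # Hypothetical respondent with two simultaneous episodes and a later one.
    r = Respondent(1, [
        Episode("Junior College A", (1980, 9), (1981, 5)),
        Episode("Teaching Hospital B", (1980, 9), (1981, 5)),
        Episode("Vocational School C", (1981, 9)),
    ])
    print(schools_at(r, (1980, 10)))
    print(schools_at(r, (1981, 10)))
```

Packages that expect one fixed-length row per case would need such a structure flattened first, for example into one row per episode.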

I have argued in this section that longitudinal researchers, even if they 
do not themselves intend to learn and use stochastic process models, would be 
well-advised to collect their data in a retrospective event history form 
rather than a reference point form. It is time to discuss, briefly, what can 
be done with reference point data (sometimes known as panel data). This kind 
of data occurs in two situations: when the data were unfortunately collected 
the wrong way, or when the data could not be collected in the right way. The 
latter occurs when respondents are unaware of changes, or when prior states 
are subject to gross recall errors. Some examples are prior attitudes, such as 
voting intentions or self-esteem; prior capacities, such as reading 
comprehension; or prior internal states, such as malaria parasites (Cohen and 
Singer, 1979). In these cases, the capacity for detailed multivariate 
analysis is much more limited. Coleman (1981, Ch. 4) addresses several topics 
in this area, using techniques for individual level data, while Coleman and 
Singer (1979) approach this problem from the more limited possibilities of 
group level data. 



SUMMARY 



In this paper I have argued that stochastic process models have recently 
advanced to the point that they have become useful for longitudinal research 
in the social sciences. I presented a few of the basic concepts involved in 
two types of continuous-time processes: the idea of a constant probability of 
occurrence in the point event process and the extensions necessary for the 
discrete state process. In order to help those interested in learning more 
about this class of techniques, I presented some observations from my own 
experience on the required level of mathematical skills and some guidance to 
the technical literature in this area. In order to permit more analysis of 
this kind, I asked those researchers responsible for collecting longitudinal 
data to change their methods from a reference point to an event history 
approach to item construction. I illustrated the difference between the two 
approaches with sample questions from two major longitudinal studies sponsored 
by the National Center for Education Statistics. 



NLS-72 SECOND FOLLOW-UP SURVEY, 1974-75 



SCHOOL ATTENDANCE FROM OCTOBER 1973 THROUGH OCTOBER 1974 



... community college, and so forth? 

No ..... 1   GO TO Q. 5S, p. 10 
Yes .... 2   GO TO Q. 10 

10. Did you attend school in the first week of October 1974? 

No ..... 1   GO TO Q. 32, p. 7 
Yes .... 2   GO TO Q. 11 

11. What is the exact name and location of the school you were attending in the first week of October 1974? (Please 
print and do not abbreviate.) 

School Name: 
City: 
State: 

14. When did you first attend this school? 

(month) (year) 

15. Are you currently attending this school? 

Yes .... 1 
No ..... 2   Date left: (month) (year) 



ATTENDANCE AT OTHER SCHOOLS FROM OCTOBER 1973 TO OCTOBER 1974 



32. Besides any schools you may ... OTHER schools from October 1973 ... 
academies, business schools, trade ... forth.) 

No ..... 1   GO TO ... 
Yes .... 2   GO TO ... 

... location of this school? Please print and do not abbreviate. (If you 
attended more ... 

School Name: 
City: 
State: 

35a. When did you first attend this school? 

(month) (year) 

35b. Are you now attending this school? 

Yes .... 1 
No ..... 2   Date left: (month) (year) 



NLS-72 THIRD FOLLOW-UP SURVEY, 1976-77 



... technical institute, vocational school, community college, and so forth? 

No ..... 1   GO TO Q. 98, p. 22 
Yes .... 2   CONTINUE WITH Q. 52 



SCHOOL ATTENDANCE IN OCTOBER 1976 



52. Did you attend school in the first week of October 1976? 

No ..... 1   GO TO Q. 66, p. 15 
Yes .... 2   CONTINUE WITH Q. 53 

53. What is the exact name and location of the school you were attending in the first week of October 1976? (Please 
print and do not abbreviate.) 

School Name: 
City: 
State: 

55. When did you first attend this school? 

(month) (year) 

56. Are you currently attending this school? 

Yes .... 1 
No ..... 2   Date left: (month) (year) 



SCHOOL ATTENDANCE IN OCTOBER 1975 



66. Now please think back to Fall 1975. Were you taking classes or courses at any school during the month of 
October 1975? 

No ..... 1   GO TO Q. 7V, p. 17 
Yes, at the same school I attended in October 1976 and 
reported above in Q. 53 ...   CONTINUE WITH Q. 67 
Yes, at a school I have not yet reported 

67. What is the exact name and location of the school you were attending in October 1975? (Please print and 
do not abbreviate.) 

School Name: 
City: 
State: 

When did you first attend this school? 

(month) (year) 



NLS-72 FOURTH FOLLOW-UP SURVEY, 1979-80 



SCHOOL ATTENDANCE DURING THE PERIOD FROM THE 
FIRST OF NOVEMBER 1976 THROUGH OCTOBER 1979 



78. During the three-year period from the first of November 1976 through October 1979, were you enrolled in or did 
you take classes at any school such as a college or university, graduate or professional school, service 
academy or school, business school, trade school, technical institute, vocational school, community college, and 
so forth? 

(Circle one.) 

No ..... 1   GO TO Q. ..., p. 30 
Yes .... 2   CONTINUE WITH Q. 79, p. 19 



SCHOOL ATTENDANCE DURING THE PERIOD FROM THE 
FIRST OF NOVEMBER 1978 THROUGH OCTOBER 1979 



79. During the period from the first of November 1978 through October 1979, were you enrolled in or did you take 
classes at any school such as a college or university, graduate or professional school, service academy or 
school, business school, trade school, community college, and so forth? 

(Circle one.) 

No ..... 1   GO TO Q. 91, p. 21 
Yes .... 2   CONTINUE WITH Q. 80 

80. During the period from the first of November 1978 through October 1979, which month(s) did you attend 
school? 

(Circle all that apply.) 

November 1978 ..... 1 
December 1978 ..... 2 
January 1979 ...... 3 
February 1979 ..... 4 
March 1979 ........ 5 
April 1979 ........ 6 
May 1979 .......... 7 
June 1979 ......... 8 
July 1979 ......... 9 
August 1979 ....... 10 
September 1979 .... 11 
October 1979 ...... 12 

81. What is the exact name and location of the school you attended the last month that you circled in Q. 80? 

School name: 
City:                State: 

NOTE: Two similar blocks of items were asked for school attendance during the 
periods November 1977 to October 1978 and November 1976 to October 1977. 



HS&B FIRST FOLLOW-UP SURVEY, 1982 



31. Between the time you left high school and the end of February 1982, have you 
enrolled in or did you take classes at any school such as a college or university, 
graduate or professional school, service academy or school, business school, trade 
school, technical institute, vocational school, community college, and so forth? (Do 
not include Armed Forces training programs.) (MARK ONE) 

Yes 
No (SKIP TO Q. 50) 

32. Which months were you enrolled in or taking classes ... 
you left high school and the end of February 1982? (MARK ALL THAT APPLY) 

1980: June, July, August, September, October, November, December 
1981: January, February, March, April, May, June, July, August, September, October, November, December 
1982: January, February 

33. Next we would like information about all of the schools you have gone ... 
left high school. Please start with the first school you went to after ... 
Answer questions A-K for that school in the first column (pages 16 and 18), then 
answer questions A-K for the second school in the next column, and so on. 
(BE SURE TO INCLUDE YOUR CURRENT SCHOOL.) 

If you attended two schools at the same time, please put them in separate 
columns. 



33. Continued. 

COLUMN 1: 1ST SCHOOL AFTER HIGH SCHOOL 
COLUMN 2: 2ND SCHOOL AFTER HIGH SCHOOL 

(Each column contains the following items.) 

A) What is the exact NAME and LOCATION of the school? (WRITE IN) 

School name: 
Address: 
City: 
State: 

C) When did you START attending this school? (MARK OVALS FOR MONTH and YEAR) 

Month: Jan., Feb., March, April, May, June, July, Aug., Sept., Oct., Nov., Dec. 
Year: 1980, 1981, 1982 

D) When did you LEAVE this school? (MARK OVALS FOR MONTH and YEAR) 

Am still attending this school, have NOT left 
Left in: 
Month: Jan., Feb., March, April, May, June, July, Aug., Sept., Oct., Nov., Dec. 
Year: 1980, 1981, 1982 



